Combining Rules and CRF Learning for Opinion Source Identification in Spanish Texts
Abstract
In this work we present a system for the automatic annotation of opinions in Spanish texts. We focus mainly on the definition of a TFS-style model for opinion predicates and their arguments, on the creation of a lexicon of opinion predicates, and on two additional variants for identifying the source of opinions. The original system extracts opinions and all of their elements (predicate, source, topic and message) using hand-coded rules. The first variant uses a CRF model to learn the source, assuming that the predicate is already tagged. The second variant is a combined version: the source recognized by the rule-based system is added as an additional attribute for training the CRF model. We found that this hybrid system performs better than either of the two systems evaluated separately. This work involved the construction of several resources for Spanish: a lexicon of opinion predicates, a 13,000-word corpus with full opinion annotations, and a 40,000-word corpus annotated with opinion predicates and sources.
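The combined variant can be read as a form of feature stacking: the rule-based system runs first, and its source decision for each token becomes one more CRF attribute. The Python sketch below illustrates this under stated assumptions. The boundary rule in `rule_based_source`, the feature names, and the example sentences are hypothetical stand-ins, not the hand-coded rules or feature set used in this work. The predicate position is passed in as input, mirroring the assumption that the predicate is already tagged.

```python
from typing import Dict, List, Optional, Union

Feature = Union[str, bool, float]

# Hypothetical clause boundaries that the toy rule will not cross.
BOUNDARIES = {",", ";", ":"}


def rule_based_source(tokens: List[str], pred_idx: int) -> List[str]:
    """Hypothetical stand-in for the hand-coded rules: tag every token between
    the nearest boundary (or sentence start) and the already-tagged predicate
    as the opinion source, using BIO labels."""
    tags = ["O"] * len(tokens)
    start = pred_idx
    while start > 0 and tokens[start - 1] not in BOUNDARIES:
        start -= 1
    for i in range(start, pred_idx):
        tags[i] = "B-SRC" if i == start else "I-SRC"
    return tags


def token_features(tokens: List[str], i: int, pred_idx: int,
                   rule_tags: Optional[List[str]]) -> Dict[str, Feature]:
    """Lexical and positional features for token i (illustrative feature set)."""
    word = tokens[i]
    dist = max(-5, min(5, i - pred_idx))  # clipped distance, used as a category
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "is_pred": i == pred_idx,
        "dist_to_pred": str(dist),
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    if rule_tags is not None:
        # Hybrid step: the rule-based decision is just one more attribute,
        # which the CRF can learn to trust or override.
        feats["rule_tag"] = rule_tags[i]
    return feats


def sentence_features(tokens: List[str], pred_idx: int,
                      use_rules: bool = True) -> List[Dict[str, Feature]]:
    """One feature dict per token; use_rules=False gives the CRF-only variant."""
    rule_tags = rule_based_source(tokens, pred_idx) if use_rules else None
    return [token_features(tokens, i, pred_idx, rule_tags)
            for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["El", "ministro", "afirmó", "que", "la", "economía", "crecerá", "."]
    print(rule_based_source(sentence, 2))
    print(sentence_features(sentence, 2)[0])
```

Calling `sentence_features(tokens, pred_idx, use_rules=False)` yields the CRF-only variant, and the default yields the hybrid one, so the two can be compared on the same data. Each sentence becomes a list of feature dicts, which matches the input that sklearn-crfsuite's `CRF.fit(X, y)` accepts (one list of dicts per sentence in `X`, one list of label strings per sentence in `y`). Any other CRF toolkit would work just as well.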
Similar resources
Opinion Identification in Spanish Texts
We present our work on the identification of opinions and their components: the source, the topic and the message. We describe a rule-based system with which we achieved a recall of 74% and a precision of 94%. Experimentation with machine-learning techniques for the same task is currently underway.
Factuality Annotation and Learning in Spanish Texts
We present a proposal for the annotation of the factuality of event mentions in Spanish texts, together with a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, reflecting a casual reader's judgements about the realis/irrealis status of the mentioned events. We also carried out some learning experiments (SVM and CRF), with encouraging results.
Automated Rule Selection for Aspect Extraction in Opinion Mining
Aspect extraction aims to extract fine-grained opinion targets from opinion texts. Recent work has shown that the syntactical approach, which employs rules about grammar dependency relations between opinion words and aspects, performs quite well. This approach is highly desirable in practice because it is unsupervised and domain independent. However, the rules need to be carefully selected and ...
Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods
Identification of prepositional phrases (PPs) has been an issue in the field of Natural Language Processing (NLP). In this paper, focusing on Chinese patent texts, we present a rule-based method and a CRF-based method to identify PPs. In the rule-based method, we manually write targeted formal identification rules based on the special features and expressions of PPs; in the CRF approach, af...
Identification of Opinion Holders
Opinion holder identification aims to extract entities that express opinions in sentences. In this paper, opinion holder identification is divided into two subtasks: author’s opinion recognition and opinion holder labeling. A support vector machine (SVM) is adopted to recognize author’s opinions, and a conditional random field (CRF) is used to label opinion holders. New features are p...